<html>
<head><meta charset="utf-8"><title>dot product · project-portable-simd · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/index.html">project-portable-simd</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html">dot product</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="217717985"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217717985" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217717985">(Nov 24 2020 at 06:16)</a>:</h4>
<p>This may be out of scope for the WG, but I'm concerned that there is no good way to implement a dot product. <code>packed_simd</code> has an example:</p>
<div class="codehilite" data-code-language="Rust"><pre><span></span><code><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">dot_prod</span><span class="p">(</span><span class="n">a</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">f32</span><span class="p">],</span><span class="w"> </span><span class="n">b</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">f32</span><span class="p">])</span><span class="w"> </span>-&gt; <span class="kt">f32</span> <span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="n">assert_eq</span><span class="o">!</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">len</span><span class="p">());</span><span class="w"></span>
<span class="w">    </span><span class="n">assert</span><span class="o">!</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span>

<span class="w">    </span><span class="n">a</span><span class="p">.</span><span class="n">chunks_exact</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">f32x4</span>::<span class="n">from_slice_unaligned</span><span class="p">)</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">zip</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">chunks_exact</span><span class="p">(</span><span class="mi">4</span><span class="p">).</span><span class="n">map</span><span class="p">(</span><span class="n">f32x4</span>::<span class="n">from_slice_unaligned</span><span class="p">))</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">)</span><span class="o">|</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">b</span><span class="p">)</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">sum</span>::<span class="o">&lt;</span><span class="n">f32x4</span><span class="o">&gt;</span><span class="p">()</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">sum</span><span class="p">()</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>This has many shortcomings compared to a reference C implementation:</p>
<div class="codehilite" data-code-language="C"><pre><span></span><code><span class="kt">float</span> <span class="nf">dot</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">n</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="cp">#pragma omp simd reduction(+:sum)</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">&lt;</span><span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">sum</span> <span class="o">+=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p><a href="https://godbolt.org/z/cq6WT5">https://godbolt.org/z/cq6WT5</a></p>
<ol>
<li>Only works for vector lengths that are multiples of 4 (or whatever vector length for the trivial generalization).</li>
<li>No FMA (after possible fringe so that one can operate in aligned chunks because AVX FMA with mem argument requires alignment)</li>
<li>Only one accumulator register (poor ILP)</li>
</ol>
<p>One could write a general-purpose <code>dot</code> using packed_simd, but it would have very high complexity and likely not be performance portable. Unlike some fancy SIMD kernels, this is code that compilers are very capable of generating, but AFAIK, not possible to obtain using Rust. Are there any parallel efforts that would enable this type of optimization? Roughly: it's okay to use fused multiply-add (<a href="https://github.com/rust-lang/rfcs/pull/2686">https://github.com/rust-lang/rfcs/pull/2686</a>) and it's okay to change summation associativity in this way.</p>
<p>If the answer is that someone needs to drive this forward, do you have any thoughts on what would be involved/where the roadblocks lie?</p>
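
<p>For illustration only, here is a minimal sketch of what addressing the three shortcomings by hand might look like in plain scalar Rust, with no SIMD crate. The function name <code>dot</code>, the constant <code>LANES</code>, and the choice of four accumulators are assumptions made for this example and are not taken from <code>packed_simd</code>. It handles lengths that are not a multiple of 4, uses <code>f32::mul_add</code> for fused multiply-add, and keeps independent accumulators. Whether the compiler vectorizes it well is not guaranteed, and <code>mul_add</code> may be slow on targets without a hardware FMA instruction.</p>

```rust
/// Dot product with four independent accumulators and a scalar tail.
/// The summation order differs from a plain left-to-right loop, so the
/// result may differ from it in the last bits.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    const LANES: usize = 4; // illustrative choice, not tuned for any target

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Leftover elements when the length is not a multiple of LANES.
    let (a_tail, b_tail) = (a_chunks.remainder(), b_chunks.remainder());

    // Independent accumulators: no dependency chain between lanes.
    let mut acc = [0.0f32; LANES];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            // Fused multiply-add: ca[i] * cb[i] + acc[i] with one rounding.
            acc[i] = ca[i].mul_add(cb[i], acc[i]);
        }
    }

    // Combine the accumulators, then fold in the tail.
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (&x, &y) in a_tail.iter().zip(b_tail) {
        sum = x.mul_add(y, sum);
    }
    sum
}
```

<p>Even this small version hard-codes the accumulator count and the reduction order, which is the kind of manual tuning the C loop with <code>#pragma omp simd reduction(+:sum)</code> leaves to the compiler.</p>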



<a name="217718267"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718267" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718267">(Nov 24 2020 at 06:23)</a>:</h4>
<p>I think the scope of OMP is slightly different from stdsimd, but you could build something on top of stdsimd that is similarly width-agnostic.</p>



<a name="217718363"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718363" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718363">(Nov 24 2020 at 06:25)</a>:</h4>
<p>Is your concern (other than the width aspect) the lack of FMA operations? Because I expect those to eventually be implemented</p>



<a name="217718386"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718386" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Mario Carneiro <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718386">(Nov 24 2020 at 06:25)</a>:</h4>
<blockquote>
<p>Only works for vector lengths that are multiples of 4 (or whatever vector length for the trivial generalization).</p>
</blockquote>
<p>I consider this a pro rather than a con, an advantage of manual vectorization. The extra slop code in the C has a cost and the compiler can almost never get rid of it, while in practical use cases it's often easier to ensure this property at the more global level by program design</p>



<a name="217718470"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718470" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718470">(Nov 24 2020 at 06:27)</a>:</h4>
<p>C can use annotation or the likes of <code>n = (n/4)*4</code> to convey alignment and lengths if one has built that into the program. I agree that heavy fringe code is sometimes pessimal.</p>



<a name="217718523"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718523" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718523">(Nov 24 2020 at 06:28)</a>:</h4>
<p>I also want to point out that I have experienced issues with C++ compilers being too cavalier with FMA, producing incorrect results somewhere that was sensitive to it, so I personally prefer needing to be explicit about FMA, at least at the level of abstraction that stdsimd is at</p>



<a name="217718549"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718549" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718549">(Nov 24 2020 at 06:29)</a>:</h4>
<p>Lack of FMA is a big one, but that doesn't convey the permissive associativity that enables changing the number of accumulators to cover instruction latency.</p>



<a name="217718641"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718641" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718641">(Nov 24 2020 at 06:31)</a>:</h4>
<p>Reading <code>2686</code> above, I get the impression that anything where optimization level can change the result will be met with some resistance, but that moves a lot of common operations (important to numerical/science libs/apps) into manual optimization territory.</p>



<a name="217718804"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718804" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Mario Carneiro <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718804">(Nov 24 2020 at 06:34)</a>:</h4>
<p>is it possible to <code>#[cfg]</code> on the optimization level?</p>



<a name="217718815"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718815" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718815">(Nov 24 2020 at 06:35)</a>:</h4>
<p>I don't think this is unique to dot products, I think it's true for any sort of reduction. I've definitely experienced it writing FFTs, and I think it comes down to the particular application</p>



<a name="217718826"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718826" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718826">(Nov 24 2020 at 06:35)</a>:</h4>
<p>Unless I'm confused, depending on the input data, changing the associativity can have a huge impact (though maybe not in the most common cases)</p>



<a name="217718828"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718828" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718828">(Nov 24 2020 at 06:35)</a>:</h4>
<p><span class="user-mention silent" data-user-id="312331">Caleb Zulawski</span> <a href="#narrow/stream/257879-project-portable-simd/topic/dot.20product/near/217718267">said</a>:</p>
<blockquote>
<p>I think the scope of OMP is slightly different from stdsimd, but you could build something on top of stdsimd that is similarly width-agnostic.</p>
</blockquote>
<p>My concern is that any such abstraction would end up manually implementing lots of things that the compiler is good/better at. It makes sense to implement things that the compiler is bad at, but it seems backward to tediously optimize things that the compiler is good at because we don't allow the compiler to do it as a matter of principle.</p>



<a name="217718914"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217718914" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217718914">(Nov 24 2020 at 06:37)</a>:</h4>
<p>Yeah, nothing special about dot products, it's just the simplest case on hand. And yes, changing associativity absolutely <em>does</em> change the results, but if the mathematical operation is a reduction then there is no reason a priori to prefer one ordering over another. All are equally good (or bad). If you sort the data or use fancy (slower) summation tricks, then one order may indeed be important.</p>



<a name="217719010"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719010" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719010">(Nov 24 2020 at 06:39)</a>:</h4>
<p>If <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi></mrow><annotation encoding="application/x-tex">P</annotation></semantics></math></span><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.13889em;">P</span></span></span></span> is a permutation, then <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mi>P</mi><mi>x</mi><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mo stretchy="false">(</mo><mi>P</mi><mi>y</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(P x)^T (P y)</annotation></semantics></math></span><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="mord mathnormal">x</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="mord mathnormal" style="margin-right:0.03588em;">y</span><span class="mclose">)</span></span></span></span> is just as valid as <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>x</mi><mi>T</mi></msup><mi>y</mi></mrow><annotation 
encoding="application/x-tex">x^T y</annotation></semantics></math></span><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.035771em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.03588em;">y</span></span></span></span>. Mathematically, they are identical, but they'll generally be different when computed numerically. Insisting on a particular ordering is just losing performance with no benefit beyond bitwise reproducibility.</p>
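
<p>As a small illustration of how ordering changes a computed sum, the sketch below adds the same three <code>f32</code> values left to right and right to left. The function names and the input values are chosen for this example only. With inputs of very different magnitude the two orders round differently, which is the effect described above.</p>

```rust
/// Sums left to right: ((0 + x0) + x1) + x2 ...
pub fn sum_forward(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Sums the same values in the reverse order.
pub fn sum_backward(xs: &[f32]) -> f32 {
    xs.iter().rev().fold(0.0, |acc, &x| acc + x)
}

/// Example input: two large values that cancel, plus one small value.
pub fn example_values() -> [f32; 3] {
    [1.0e8, -1.0e8, 1.0]
}
```

<p>Forward, the large values cancel first and the small one survives. Backward, the small value is absorbed by rounding when it is added to <code>-1.0e8</code>, because adjacent <code>f32</code> values near 1e8 are 8 apart.</p>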



<a name="217719067"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719067" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719067">(Nov 24 2020 at 06:40)</a>:</h4>
<p>I'm not sure it's a matter of preferring a particular ordering, since any particular bit of Rust will already have a particular ordering.  There are definitely situations where a particular ordering may be slower but is necessary for stability or some other reason</p>



<a name="217719088"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719088" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719088">(Nov 24 2020 at 06:41)</a>:</h4>
<p>Many climate modeling groups (and some other domains) have requirements for bitwise reproducibility, but they usually freeze the compiler options and other aspects of (multi-process and threaded) parallelism.</p>



<a name="217719218"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719218" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719218">(Nov 24 2020 at 06:44)</a>:</h4>
<p>One of my first questions is if it's even possible to indicate to LLVM that it's ok to do optimizations like that. If it is, I think it would be more appropriate for that to be a separate codegen attribute that doesn't necessarily even have anything to do with stdsimd and would apply to scalar code as well</p>



<a name="217719220"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719220" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719220">(Nov 24 2020 at 06:44)</a>:</h4>
<p>Order matters if you sorted the elements intentionally to reduce rounding error. But an iterative linear algebra library (for example) uses dot products heavily and won't have done such things. In practice, if you're up against stability limits, it's more common to promote to f64 for the summation (and allow the compiler to change associativity to improve SIMD utilization).</p>
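
<p>A minimal sketch of the "promote to f64 for the summation" approach, assuming <code>f32</code> inputs and an <code>f32</code> result. The function names are hypothetical. Each product is formed and accumulated in <code>f64</code>, and only the final value is rounded back to <code>f32</code>.</p>

```rust
/// Accumulates in f32: every partial sum is rounded to f32.
pub fn dot_f32_acc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).fold(0.0f32, |s, (&x, &y)| s + x * y)
}

/// Accumulates in f64 and rounds once at the end.
pub fn dot_f64_acc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .fold(0.0f64, |s, (&x, &y)| s + f64::from(x) * f64::from(y)) as f32
}
```

<p>For an input such as <code>a = [1e8, 1, -1e8]</code> and <code>b = [1, 1, 1]</code>, the <code>f32</code> accumulator loses the middle term while the <code>f64</code> accumulator keeps it. The wider accumulator reduces sensitivity to ordering but does not remove it.</p>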



<a name="217719313"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719313" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719313">(Nov 24 2020 at 06:46)</a>:</h4>
<p>I wouldn't say it only affects sorted data. You may use a particular order in a classification algorithm to avoid a bias</p>



<a name="217719314"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719314" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719314">(Nov 24 2020 at 06:46)</a>:</h4>
<p>Yep, that's what I'd like (and would allow lots of scientific software to avoid/reduce direct use of core::simd). Clang processes the <code>#pragma omp simd reduction(+:sum)</code> so it must be possible for rustc to tell llvm the same thing.</p>



<a name="217719469"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719469" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719469">(Nov 24 2020 at 06:49)</a>:</h4>
<p>Sure, but it implies you've organized data around this particular operation (and you could have arranged it for a 16-lane accumulator, for example). I realize I'm proposing to let the compiler choose the accumulator width, so this is a different use case.</p>



<a name="217719716"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719716" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719716">(Nov 24 2020 at 06:54)</a>:</h4>
<p>Note that BLAS <code>*dot</code> does not specify the ordering and the rest of the linear algebra stack (QR/Eigensolve/SVD) is fine with that. Same applies to parallel libraries like PETSc. Needing dot products and similar reductions in a particular ordering is the exception rather than the rule. And I'm not arguing to open associativity math everywhere, just that there is significant demand for it to be available without tedious manual SIMD. (Obviously dot product is sufficiently important that we could do the manual SIMD, but it's a major disincentive to writing such software if you have to do it for every perf-critical thing.)</p>



<a name="217719895"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719895" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719895">(Nov 24 2020 at 06:58)</a>:</h4>
<p>If we can find a particular way of instructing LLVM to change just fp associativity I think it would be appropriate to add a codegen attribute.  At that point it's completely opt-in</p>



<a name="217719921"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719921" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719921">(Nov 24 2020 at 06:59)</a>:</h4>
<p>As for attributes, my personal preference would be to be able to enable at block/function/crate granularity and perhaps provide wrapper types for strict ordering.</p>



<a name="217719982"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719982" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719982">(Nov 24 2020 at 07:00)</a>:</h4>
<p>I think a lot of that depends entirely on what LLVM provides us</p>



<a name="217719986"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217719986" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217719986">(Nov 24 2020 at 07:01)</a>:</h4>
<p>Cool, are there any related attributes that would be good to look at? I can look into what Clang is passing to LLVM for this sort of thing.</p>



<a name="217720075"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720075" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720075">(Nov 24 2020 at 07:03)</a>:</h4>
<p>Attributes like "cold" and "inline" get passed directly to the backend for codegen as far as I know. Target features are another one but affect lowering as well</p>



<a name="217720091"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720091" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720091">(Nov 24 2020 at 07:03)</a>:</h4>
<p>I don't know enough about openmp to have any idea what that uses, though, as far as LLVM IR goes</p>



<a name="217720145"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720145" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720145">(Nov 24 2020 at 07:04)</a>:</h4>
<p>I suppose you could just generate some with clang and see what it spits out?</p>



<a name="217720294"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720294" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720294">(Nov 24 2020 at 07:08)</a>:</h4>
<p>With <code>-S -emit-llvm</code>, but maybe you're interested in an earlier stage?<br>
<a href="https://godbolt.org/z/MG6vdz">https://godbolt.org/z/MG6vdz</a></p>



<a name="217720491"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720491" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720491">(Nov 24 2020 at 07:12)</a>:</h4>
<p>Dropping to <code>-O0</code> still yields <code>fadd contract</code>, and that goes away if I drop <code>-ffp-contract=fast</code>. The <code>fadd contract</code> and <code>fmul contract</code> are issued even for <code>-march=x86-64</code> (which has no such instruction), so it looks like those are just the way of conveying that contraction is allowed.</p>



<a name="217720517"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720517" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720517">(Nov 24 2020 at 07:13)</a>:</h4>
<p>Yeah, it's not clear what it's doing regarding associativity, even without optimizations</p>



<a name="217720520"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720520" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720520">(Nov 24 2020 at 07:13)</a>:</h4>
<p><span class="user-mention silent" data-user-id="312331">Caleb Zulawski</span> <a href="#narrow/stream/257879-project-portable-simd/topic/dot.20product/near/217719218">said</a>:</p>
<blockquote>
<p>One of my first questions is if it's even possible to indicate to LLVM that it's ok to do optimizations like that. If it is, I think it would be more appropriate for that to be a separate codegen attribute that doesn't necessarily even have anything to do with stdsimd and would apply to scalar code as well</p>
</blockquote>
<p><a href="http://llvm.org/docs/LangRef.html#fast-math-flags">http://llvm.org/docs/LangRef.html#fast-math-flags</a> -- it looks like <code>reassoc</code> is the one you'd want for a dot product</p>



<a name="217720572"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720572" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720572">(Nov 24 2020 at 07:14)</a>:</h4>
<p>That seems to be it!</p>



<a name="217720644"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720644" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720644">(Nov 24 2020 at 07:16)</a>:</h4>
<p>It looks like these actually apply to all floating point ops individually, so perhaps this is to some extent a stdsimd issue (since the simd intrinsics would need to support it as well)</p>



<a name="217720654"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720654" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720654">(Nov 24 2020 at 07:16)</a>:</h4>
<p>But still handled at codegen</p>



<a name="217720664"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720664" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720664">(Nov 24 2020 at 07:17)</a>:</h4>
<p>Thanks for that. Should <code>reassoc</code> be visible in the IR here?<br>
<a href="https://godbolt.org/z/vfMjnz">https://godbolt.org/z/vfMjnz</a></p>



<a name="217720713"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720713" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720713">(Nov 24 2020 at 07:18)</a>:</h4>
<p>I don't know what "<code>-ffp-contract=fast</code>" does.  You might have forced it to use the stronger <code>fast</code> flag, with that.</p>



<a name="217720732"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720732" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720732">(Nov 24 2020 at 07:19)</a>:</h4>
<p>I would have expected to see it but it's definitely not there. I wonder if perhaps OMP doesn't actually allow it and something else is going on? (Though it's probably still a useful codegen attribute to provide...)</p>



<a name="217720734"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720734" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720734">(Nov 24 2020 at 07:19)</a>:</h4>
<p>That example has <code>fmul contract</code> and <code>fadd contract</code>, which is a different flag</p>



<a name="217720735"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720735" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720735">(Nov 24 2020 at 07:19)</a>:</h4>
<p><code>-ffp-contract=fast</code> is exactly the <code>contract</code> flag. It's allowed by default by gcc (and Intel, which allows way more).</p>



<a name="217720737"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720737" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720737">(Nov 24 2020 at 07:19)</a>:</h4>
<p>Yeah, I only see contract</p>



<a name="217720777"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720777" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720777">(Nov 24 2020 at 07:20)</a>:</h4>
<p>And <code>contract</code> is explicitly documented as <em>not</em> being about reassociating</p>
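<p>To make the distinction concrete, here is a minimal sketch on stable Rust, not taken from the linked godbolt examples. The function names <code>contraction_demo</code> and <code>reassoc_demo</code> are made up for illustration. The first shows what contraction changes (one rounding instead of two), the second what reassociation changes (the grouping of the additions).</p>

```rust
/// Contraction: `a * b + c` rounds the product and then the sum,
/// while a fused multiply-add rounds only once at the end.
/// Returns (separately rounded result, fused result).
pub fn contraction_demo() -> (f32, f32) {
    let a = 1.0f32 + f32::EPSILON;
    let b = 1.0f32 - f32::EPSILON;
    let c = -1.0f32;
    // Exact product is 1 - EPSILON^2, which rounds to 1.0 in f32.
    let separate = a * b + c;
    // mul_add keeps the exact product until the final rounding.
    let fused = a.mul_add(b, c);
    (separate, fused)
}

/// Reassociation: regrouping the same additions changes the result.
/// Returns ((big + 1) + 1, big + (1 + 1)).
pub fn reassoc_demo() -> (f32, f32) {
    let big = 16_777_216.0f32; // 2^24: odd integers above this are not representable
    let left = (big + 1.0) + 1.0; // each +1.0 is rounded away
    let right = big + (1.0 + 1.0); // 2.0 survives the addition
    (left, right)
}
```

<p>Under these assumptions the two results differ within each pair, which is why the two flags are tracked separately.</p>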



<a name="217720864"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720864" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720864">(Nov 24 2020 at 07:22)</a>:</h4>
<p>So I'm not sure what you're trying to get it to do, here.  Are you just trying to get <code>fma</code>s?</p>



<a name="217720867"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720867" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720867">(Nov 24 2020 at 07:22)</a>:</h4>
<p>Ah, the <code>omp simd</code> is explicitly unrolling, and the unroll amount depends on the target arch.</p>



<a name="217720949"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217720949" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217720949">(Nov 24 2020 at 07:24)</a>:</h4>
<p><span class="user-mention" data-user-id="125270">@scottmcm</span> FMA + multiple accumulators to use packed registers and cover instruction latency (4-5 cycles per FMA; 2 FMAs can be issued per cycle, but usually only one with a load from memory).</p>
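<p>A rough sketch of the multiple-accumulator idea in plain stable Rust, for illustration only. The name <code>dot_multi_acc</code> and the choice of 8 partial sums are assumptions, not something measured in this thread. Writing the partial sums out by hand fixes the summation order in the source, so the compiler needs no reassociation permission to keep them independent.</p>

```rust
/// Dot product using LANES independent partial sums (hypothetical helper).
pub fn dot_multi_acc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    const LANES: usize = 8; // assumed width; tune per target

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);

    // Elements that do not fill a whole chunk are summed separately.
    let tail: f32 = a_chunks
        .remainder()
        .iter()
        .zip(b_chunks.remainder())
        .map(|(x, y)| x * y)
        .sum();

    // Each slot of `acc` is its own dependency chain, so consecutive
    // iterations do not wait on a single running sum.
    let mut acc = [0.0f32; LANES];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            // `ca[i].mul_add(cb[i], acc[i])` would request fusion explicitly.
            acc[i] += ca[i] * cb[i];
        }
    }

    acc.iter().sum::<f32>() + tail
}
```

<p>Whether this actually compiles to packed FMAs depends on the target features and optimization level, so the generated assembly is worth checking.</p>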



<a name="217721035"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217721035" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217721035">(Nov 24 2020 at 07:26)</a>:</h4>
<p>I'd skip the <code>omp simd</code> optimization and just go for a way to indicate that additions into the accumulator may carry the <code>reassoc</code> flag. Maybe a type wrapper would be good for that.</p>



<a name="217721053"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217721053" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217721053">(Nov 24 2020 at 07:27)</a>:</h4>
<p><code>iter().sum::&lt;Assoc&lt;f32&gt;&gt;()</code> or some such.</p>



<a name="217721125"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217721125" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217721125">(Nov 24 2020 at 07:28)</a>:</h4>
<p>Thanks for this discussion. I'll look into the code some in the morning.</p>



<a name="217723302"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217723302" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217723302">(Nov 24 2020 at 08:06)</a>:</h4>
<p>Hmm, the autovectorizer will <code>vfmadd213ps</code>, but it seems to do a bunch of weird permuting: <a href="https://rust.godbolt.org/z/Me18rn">https://rust.godbolt.org/z/Me18rn</a></p>



<a name="217757428"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217757428" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217757428">(Nov 24 2020 at 14:04)</a>:</h4>
<p>Shuffles gone with a slight indexing adjustment. <a href="https://rust.godbolt.org/z/4YhT1x">https://rust.godbolt.org/z/4YhT1x</a></p>



<a name="217759919"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217759919" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Lokathor <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217759919">(Nov 24 2020 at 14:23)</a>:</h4>
<p>The fundamental problem here is storing xyzw in your f32x4. You should be storing xxxx and then have yyyy be another vector and so on. Then you get four dot products at "full" speed.</p>
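<p>As a sketch of that layout (the type <code>Vec4x4</code> and function <code>dot4</code> are hypothetical names, and plain arrays stand in for an <code>f32x4</code>): each field holds one component of four different vectors, so four dot products are computed with only vertical operations.</p>

```rust
/// Four 4-component vectors stored component-wise ("xxxx, yyyy, ...").
#[derive(Clone, Copy, Debug)]
pub struct Vec4x4 {
    pub x: [f32; 4],
    pub y: [f32; 4],
    pub z: [f32; 4],
    pub w: [f32; 4],
}

/// Computes four independent dot products, one per lane.
/// No horizontal reduction across lanes is needed.
pub fn dot4(a: &Vec4x4, b: &Vec4x4) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i] + a.w[i] * b.w[i];
    }
    out
}
```

<p>This only helps when there are several instances of the same small problem to batch together.</p>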



<a name="217765599"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217765599" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217765599">(Nov 24 2020 at 15:05)</a>:</h4>
<p>I completely agree that batching is good when you have it, but there are plenty of times when you don't have lots of instances of the same problem. This happens when using adaptive methods and for linear solvers in optimization and simulation. Sometimes the vectors are much too large to fit in cache, in which case almost any old dot product will operate at memory bandwidth. But when they fit in cache, it's important to be able to make it fast.</p>



<a name="217809461"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217809461" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217809461">(Nov 24 2020 at 20:31)</a>:</h4>
<p>I've just learned of this way to produce <code>fadd fast</code> IR (<code>fast</code> includes <code>contract</code>, <code>reassoc</code>, and several others).<br>
<a href="https://doc.rust-lang.org/core/intrinsics/fn.fadd_fast.html">https://doc.rust-lang.org/core/intrinsics/fn.fadd_fast.html</a><br>
I could see adding <code>fadd_reassoc</code> and <code>fadd_contract</code>, though the combinatorial number of policies gets out of hand pretty quickly (<code>fadd_reassoc_contract_arcp</code>?).  For ergonomics, there seem to be two strategies:</p>
<ol>
<li>attributes to add flags to all fp primitives within a block</li>
<li>wrapper types to implement <code>Add</code> using <code>fadd_reassoc</code> and the like</li>
</ol>
<p>I hacked up an option 2 prototype using <code>fadd_fast</code> and have</p>
<div class="codehilite" data-code-language="Rust"><pre><span></span><code><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">f32</span><span class="p">],</span><span class="w"> </span><span class="n">b</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">f32</span><span class="p">])</span><span class="w"> </span>-&gt; <span class="kt">f32</span> <span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="n">a</span><span class="p">.</span><span class="n">iter</span><span class="p">()</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">zip</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="o">|</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">sum</span>::<span class="o">&lt;</span><span class="n">Fast</span><span class="o">&lt;</span><span class="kt">f32</span><span class="o">&gt;&gt;</span><span class="p">()</span><span class="w"></span>
<span class="w">        </span><span class="p">.</span><span class="n">into</span><span class="p">()</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>yielding good codegen equivalent to the C baseline (<code>-C target-cpu=skylake</code>):<br>
<a href="/user_uploads/4715/UEXUBJ0Vx6yuLFjfF-YDH1x9/image.png">image.png</a> </p>
<div class="message_inline_image"><a href="/user_uploads/4715/UEXUBJ0Vx6yuLFjfF-YDH1x9/image.png" title="image.png"><img src="/user_uploads/4715/UEXUBJ0Vx6yuLFjfF-YDH1x9/image.png"></a></div><p>Is this something others would want in a crate? What would be a viable strategy to add primitives for <code>reassoc</code> and <code>contract</code> semantics (without all of <code>fast</code>), and is there some roadmap in which the necessary primitives could stabilize?</p>
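<p>The prototype itself is not shown above, so the following is only a guess at the shape such a wrapper could take, written so that it compiles on stable Rust. The definition of <code>Fast</code> and its trait impls are assumptions. Plain <code>+</code> stands in for the nightly-only, unsafe <code>core::intrinsics::fadd_fast</code>, so this version shows the API shape but does not by itself give the optimizer permission to reorder the sum.</p>

```rust
use std::iter::Sum;
use std::ops::Add;

/// Hypothetical wrapper marking a float whose additions may be relaxed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Fast<T>(pub T);

impl Add for Fast<f32> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // Stand-in: a nightly version would call the fast-math intrinsic
        // here (inside `unsafe`) instead of the ordinary `+`.
        Fast(self.0 + rhs.0)
    }
}

/// Lets `.sum::<Fast<f32>>()` consume an iterator of plain `f32`.
impl Sum<f32> for Fast<f32> {
    fn sum<I: Iterator<Item = f32>>(iter: I) -> Self {
        iter.fold(Fast(0.0), |acc, x| acc + Fast(x))
    }
}

/// Lets the final `.into()` unwrap the result.
impl From<Fast<f32>> for f32 {
    fn from(v: Fast<f32>) -> f32 {
        v.0
    }
}

pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| x * y)
        .sum::<Fast<f32>>()
        .into()
}
```

<p>With this shape the call site stays as written above, and only the body of <code>add</code> would change if a <code>reassoc</code>-only primitive became available.</p>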



<a name="217829274"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217829274" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217829274">(Nov 24 2020 at 23:50)</a>:</h4>
<p>A major problem is that <code>fadd_fast</code> cannot be safe right now, and I think we'd prefer these could be used without <code>unsafe</code>.</p>



<a name="217829407"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217829407" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217829407">(Nov 24 2020 at 23:52)</a>:</h4>
<p>But LLVM getting <code>freeze</code> might open doors for that; see <a href="#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Taking.20advantage.20of.20.60freeze.60.3F/near/212979135">https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Taking.20advantage.20of.20.60freeze.60.3F/near/212979135</a></p>



<a name="217829448"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217829448" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217829448">(Nov 24 2020 at 23:53)</a>:</h4>
<p><code>reassoc</code> and <code>contract</code>, of course, would be simpler.  I'm unsure whether we could ever make them <code>const</code>, but they seem at least clearly safe.  (As do nsz/arcp, just not nnan/ninf)</p>



<a name="217841383"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217841383" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217841383">(Nov 25 2020 at 03:34)</a>:</h4>
<p>I see there is prior art on the fast float lib: <a href="https://github.com/bluss/fast-floats">https://github.com/bluss/fast-floats</a></p>
<p><span class="user-mention" data-user-id="125270">@scottmcm</span> Do you think <code>fadd_reassoc</code> is the right strategy to expose such functionality in <code>core</code>, and if so, how should it be proposed? (I'm somewhat concerned about all the combinations of flags. Opinionated subsets are appropriate in a library, but perhaps less so in <code>core</code>.) I suppose an alternative would be <code>fadd_with_flags(F, F, Flags) -&gt; F</code>.</p>



<a name="217841521"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217841521" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217841521">(Nov 25 2020 at 03:37)</a>:</h4>
<p>An issue with the likes of <code>reassoc</code> and <code>contract</code> is that expressions can evaluate differently when optimized for different targets. I'd think that precludes making it <code>const</code> since some hosts wouldn't be able to emulate the target. Not having <code>const</code> is fine by me, I just want something practical that can eventually become stable.</p>



<a name="217845338"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845338" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845338">(Nov 25 2020 at 05:14)</a>:</h4>
<p><span class="user-mention" data-user-id="322310">@Jed</span> I don't know what the best way forward would be.  Probably depends greatly on whether the goal is to make a customizable scalar type or just to use it internally to offer some specific things like horizontal sums of the simd types.</p>
<p>Or maybe the only realistic option to be able to always use the SIMD things would be for simd to always be contract+reassoc (or whatever) so that the library can use different orders on different architectures.</p>



<a name="217845350"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845350" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845350">(Nov 25 2020 at 05:14)</a>:</h4>
<p>aside: the <code>fast</code> flag includes all the other flags, in LLVM.  (I backronym it to "finite allowing sketchy transformations".)</p>



<a name="217845515"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845515" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845515">(Nov 25 2020 at 05:18)</a>:</h4>
<p>I personally think the best way forward may be a function attribute (which would allow either a crate or std to implement a wrapper type in the future).  The attribute should probably error on const fns, though I'm not really sure of the status of float consts.</p>



<a name="217845542"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845542" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845542">(Nov 25 2020 at 05:19)</a>:</h4>
<p>I'm not sure to what extent this has been discussed before, if at all, but it's probably best to get in touch with compiler people for this. I'm guessing it would need an RFC</p>



<a name="217845549"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845549" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845549">(Nov 25 2020 at 05:19)</a>:</h4>
<p>Nobody's sure of the status of const floats :P</p>
<p>The hard part about the attribute is things like "do you want it to affect <code>Iterator::sum</code>?"</p>



<a name="217845598"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845598" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845598">(Nov 25 2020 at 05:20)</a>:</h4>
<p>I want to make it feasible to write/port more numerical libraries and scientific code. (I'm a maintainer of some parallel algebraic solvers and discretization libraries, currently written in C with various GPU bits.) I'd like to be able to offer more reliable error handling (porting parts of libraries) and users some safer options (catch more misuse at compile-time versus run-time). Being able to write fast kernels is just one part of that; <code>contract</code> and <code>arcp</code> are typically the most useful in user code (implementing material models, etc.) because data is organized for vertical SIMD.</p>



<a name="217845602"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845602" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845602">(Nov 25 2020 at 05:20)</a>:</h4>
<p>Because if the attribute doesn't "infect" function calls, then you still end up needing <code>Iterator::sum_reassoc</code> or whatever.</p>



<a name="217845612"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845612" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845612">(Nov 25 2020 at 05:21)</a>:</h4>
<p>At which point it feels like one should instead have <code>f32reassoc</code></p>



<a name="217845625"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845625" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845625">(Nov 25 2020 at 05:21)</a>:</h4>
<p>I believe the attribute would "infect" them much the way target feature does.  It would change the IR to use reassoc ops instead</p>



<a name="217845691"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845691" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845691">(Nov 25 2020 at 05:22)</a>:</h4>
<p>For the function with the attribute that would be relatively easy, yes.  The problem is if the function calls another function that doesn't have the attribute -- particularly if it's a generic function.</p>



<a name="217845713"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845713" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845713">(Nov 25 2020 at 05:23)</a>:</h4>
<p>(Hence the example of "how do I get a most-efficient-associativity <code>Iterator::sum</code>?")</p>



<a name="217845717"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845717" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845717">(Nov 25 2020 at 05:23)</a>:</h4>
<p>Perhaps it would best be exposed as a compiler intrinsic instead, true, but I believe that still may need to be discussed with compiler people! <span aria-label="big smile" class="emoji emoji-1f604" role="img" title="big smile">:big_smile:</span></p>



<a name="217845719"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845719" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845719">(Nov 25 2020 at 05:23)</a>:</h4>
<p>And maybe the way to solve that is to say it's out of scope, but I'm not certain that's the best answer.</p>



<a name="217845784"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845784" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845784">(Nov 25 2020 at 05:24)</a>:</h4>
<p>FWIW, I <a href="https://github.com/rust-lang/rust/pull/52205">added</a> the <code>nowrap_{add|sub|mul|neg}</code> intrinsics, so I do actually know a bit about this <span aria-label="upside down" class="emoji emoji-1f643" role="img" title="upside down">:upside_down:</span></p>
<p>(Way more than the very little I know about actual simd, certainly.)</p>



<a name="217845961"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217845961" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217845961">(Nov 25 2020 at 05:29)</a>:</h4>
<p>I browsed the code for <code>fadd_fast</code> and the technical part of adding intrinsics looks straightforward. I don't know how to implement attributes that would behave this way, but many of the users I envision would like to apply them at crate granularity (<code>contract</code> and <code>arcp</code>, perhaps <code>reassoc</code>). The types as in <code>.iter().sum::&lt;Reassoc&lt;f32&gt;&gt;()</code> would be acceptable for a lot of library code (where the performance hotspots tend to be more localized and better understood). Less so in user "physics" code where it would create a lot of noise to have explicit conversions.</p>



<a name="217846114"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846114" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846114">(Nov 25 2020 at 05:32)</a>:</h4>
<p>Looking at <span class="user-mention" data-user-id="125270">@scottmcm</span>'s experience, intrinsics may be the way to go, and they would at least work immediately on nightly.</p>



<a name="217846228"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846228" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846228">(Nov 25 2020 at 05:34)</a>:</h4>
<p>SIMD intrinsics would need to be added as well but no reason it couldn't be done</p>



<a name="217846233"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846233" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846233">(Nov 25 2020 at 05:34)</a>:</h4>
<p>As for "infection" of function calls by attributes, that can be delicate because some library code relies on strict semantics and you don't want it behaving incorrectly because it was called from a fp <code>fast</code> scope. Would libraries need to have their own inner attribute scope to protect that sort of sensitive code? Seems not good for backward compatibility.</p>



<a name="217846268"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846268" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846268">(Nov 25 2020 at 05:35)</a>:</h4>
<p>We could experiment with a stable interface in stdsimd but I'd be slightly concerned about overstepping our bounds for something that really should apply to scalars as well</p>



<a name="217846319"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846319" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846319">(Nov 25 2020 at 05:36)</a>:</h4>
<p>Totally agreed about some things not wanting it.  That just pushes more towards a type-based solution, to me.</p>



<a name="217846555"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846555" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846555">(Nov 25 2020 at 05:36)</a>:</h4>
<p>I wonder if this could finally be the thing that pushes custom literals to being worth doing...</p>



<a name="217846562"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846562" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846562">(Nov 25 2020 at 05:37)</a>:</h4>
<p><code>Wrapping&lt;i32&gt;</code> and such really want them too</p>



<a name="217846565"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846565" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846565">(Nov 25 2020 at 05:37)</a>:</h4>
<p>Custom literals?</p>



<a name="217846587"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846587" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846587">(Nov 25 2020 at 05:37)</a>:</h4>
<p>The biggest problem with a <code>BikeShedF32</code> type is that you have to call its constructor for literals all the time.</p>



<a name="217846618"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846618" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846618">(Nov 25 2020 at 05:38)</a>:</h4>
<p>If you could just <code>x * 2.0</code> it'd be so much nicer to use.</p>



<a name="217846650"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846650" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846650">(Nov 25 2020 at 05:38)</a>:</h4>
<p>vs needing <code>x * Wrapping(2)</code> or whatever.</p>



<a name="217846653"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846653" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846653">(Nov 25 2020 at 05:38)</a>:</h4>
<p><code>impl Mul&lt;f32&gt; for Fancy&lt;f32&gt;</code>?</p>



<a name="217846681"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846681" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846681">(Nov 25 2020 at 05:39)</a>:</h4>
<p>Hmm, I wonder if this could be exposed with the different flags as const generics options...</p>



<a name="217846729"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846729" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846729">(Nov 25 2020 at 05:40)</a>:</h4>
<p>The wrapping type infects everything it encounters, and eventually you call <code>result.into()</code> to get back to <code>f32</code>. At least that's what I did (and then found similar in an existing package).</p>
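<p>A rough sketch of that pattern, using the placeholder name <code>Fancy</code> from the discussion (hypothetical, not an existing crate's API). Implementing the operators against plain <code>f32</code> lets literals appear on the right-hand side without a constructor, and the wrapper propagates until <code>.into()</code>.</p>

```rust
use std::ops::{Add, Mul};

/// Hypothetical wrapper type; the arithmetic here is still strict.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Fancy<T>(pub T);

// Wrapper * scalar stays wrapped, so `x * 2.0` needs no constructor call.
impl Mul<f32> for Fancy<f32> {
    type Output = Fancy<f32>;
    fn mul(self, rhs: f32) -> Fancy<f32> {
        Fancy(self.0 * rhs)
    }
}

// Wrapper + scalar stays wrapped as well.
impl Add<f32> for Fancy<f32> {
    type Output = Fancy<f32>;
    fn add(self, rhs: f32) -> Fancy<f32> {
        Fancy(self.0 + rhs)
    }
}

impl From<Fancy<f32>> for f32 {
    fn from(v: Fancy<f32>) -> f32 {
        v.0
    }
}

/// Computes 2x + 1; the wrapper "infects" the whole expression.
pub fn scale_and_shift(x: f32) -> f32 {
    let y = Fancy(x) * 2.0 + 1.0;
    y.into()
}
```

<p>This only covers literals on the right; <code>2.0 * x</code> would need a separate <code>impl Mul&lt;Fancy&lt;f32&gt;&gt; for f32</code>.</p>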



<a name="217846733"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846733" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846733">(Nov 25 2020 at 05:40)</a>:</h4>
<p>it's a bit weird, but <code>type myf32 = BikeshedF32&lt;{Options { reassoc: true, .. }}&gt;;</code> could work one day...</p>
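<p>Struct-valued const parameters like the <code>Options { .. }</code> form above are not available on stable Rust, but a rougher version with one <code>const bool</code> per flag is. The sketch below assumes that encoding; <code>BikeshedF32</code>, <code>MyF32</code>, and <code>flags</code> are all placeholder names, and the addition is still strict.</p>

```rust
use std::ops::Add;

/// Hypothetical float wrapper whose relaxed-math flags live in the type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BikeshedF32<const REASSOC: bool, const CONTRACT: bool>(pub f32);

impl<const R: bool, const C: bool> BikeshedF32<R, C> {
    pub fn new(v: f32) -> Self {
        BikeshedF32(v)
    }
}

impl<const R: bool, const C: bool> Add for BikeshedF32<R, C> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // R and C are compile-time constants; a real implementation
        // could select an intrinsic based on them. This one adds strictly.
        BikeshedF32(self.0 + rhs.0)
    }
}

/// One alias per flag combination a crate cares about.
pub type MyF32 = BikeshedF32<true, false>;

/// Reads the flags back out of the type, to show they survive arithmetic.
pub fn flags<const R: bool, const C: bool>(_: BikeshedF32<R, C>) -> (bool, bool) {
    (R, C)
}
```

<p>Values with different flag sets are different types, so mixing them is a compile error unless conversions are written explicitly.</p>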



<a name="217846902"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846902" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846902">(Nov 25 2020 at 05:45)</a>:</h4>
<p>Const generics specialization <span aria-label="smiling devil" class="emoji emoji-1f608" role="img" title="smiling devil">:smiling_devil:</span></p>



<a name="217846917"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846917" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846917">(Nov 25 2020 at 05:45)</a>:</h4>
<p>Is there an idiomatic way to pass <code>&amp;[f32]</code> to functions that work with <code>&amp;[BSF32]</code> (and containers)? Would each function need to convert each argument (zero cost, but visual noise)?</p>



<a name="217846964"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217846964" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217846964">(Nov 25 2020 at 05:46)</a>:</h4>
<p>I think that's the intention of safe transmute</p>



<a name="217847041"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217847041" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217847041">(Nov 25 2020 at 05:48)</a>:</h4>
<p>It's zero cost, but unless I missed something, requires explicit code.</p>



<a name="217854038"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217854038" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217854038">(Nov 25 2020 at 08:02)</a>:</h4>
<p>I would also expect safe transmute to handle that case.  (And until then, it's possible with <code>slice::from_raw_parts</code>, for which we could make a safe wrapper.)  It doesn't seem completely unreasonable to require a marker like that to "opt out" of floating-point determinism.</p>
<p>The hard one is going the other way if there's ninf/nnan involved, since safely getting an ordinary <code>f32</code> from one of those needs <code>freeze</code>, so probably can't just be the pointer cast.</p>
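<p>A minimal sketch of the safe wrapper over <code>slice::from_raw_parts</code> suggested above, assuming the wrapper type is <code>#[repr(transparent)]</code>. <code>BSF32</code> is the placeholder name from the earlier question and <code>as_relaxed</code> is a hypothetical helper. It covers only the <code>f32</code>-to-wrapper direction; as noted, the reverse may need more than a pointer cast if ninf/nnan are involved.</p>

```rust
use std::slice;

/// Hypothetical relaxed-semantics float with the same layout as `f32`.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
pub struct BSF32(pub f32);

/// Reinterprets `&[f32]` as `&[BSF32]` without copying.
pub fn as_relaxed(xs: &[f32]) -> &[BSF32] {
    // SAFETY: `BSF32` is `repr(transparent)` over `f32`, so size, alignment,
    // and validity match; the result borrows `xs` for the same lifetime.
    unsafe { slice::from_raw_parts(xs.as_ptr().cast::<BSF32>(), xs.len()) }
}
```

<p>The call is zero cost, but it still has to be written at each boundary, which is the visual noise being discussed.</p>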



<a name="217947469"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217947469" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217947469">(Nov 25 2020 at 22:39)</a>:</h4>
<p>A common BLAS-level building block is <code>fn axpy(a: f32, x: &amp;[f32], y: &amp;mut [f32])</code>, which computes <code>y[i] += a * x[i]</code>. With ninf/nnan, one would have to check <code>y</code> for this to ever be safe, right? (The cost of doing so more than outweighs any benefit you could get from using <code>fast</code>.)</p>
<p>In practice, <code>contract</code> is all that's useful (performance-wise) for this sort of operation, but the interface concern remains.</p>
<p>I wouldn't have a problem if a new interface and stabilization left out those problematic features.</p>
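<p>For reference, a sketch of <code>axpy</code> with the signature quoted above, where the contraction is written by hand using <code>f32::mul_add</code>. With a <code>contract</code>-style flag the compiler would be allowed to make this choice itself; here it is forced, which may be slow on targets without a hardware FMA instruction.</p>

```rust
/// Computes `y[i] += a * x[i]` with an explicitly fused multiply-add.
pub fn axpy(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len());
    for (yi, &xi) in y.iter_mut().zip(x) {
        // One rounding instead of two: a * xi + *yi is evaluated fused.
        *yi = a.mul_add(xi, *yi);
    }
}
```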



<a name="217947727"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217947727" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217947727">(Nov 25 2020 at 22:42)</a>:</h4>
<p><span class="user-mention silent" data-user-id="312331">Caleb Zulawski</span> <a href="#narrow/stream/257879-project-portable-simd/topic/dot.20product/near/217845542">said</a>:</p>
<blockquote>
<p>I'm not sure to what extent this has been discussed before, if at all, but it's probably best to get in touch with compiler people for this. I'm guessing it would need an RFC</p>
</blockquote>
<p>Who is most appropriate to reach out to?</p>



<a name="217948224"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217948224" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217948224">(Nov 25 2020 at 22:49)</a>:</h4>
<p>Looking here: <a href="https://github.com/rust-lang/compiler-team/blob/master/content/experts/map.toml">https://github.com/rust-lang/compiler-team/blob/master/content/experts/map.toml</a>, nothing specifically interface related jumps out</p>



<a name="217948381"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217948381" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217948381">(Nov 25 2020 at 22:51)</a>:</h4>
<p>I don't know but <span class="user-mention" data-user-id="204346">@Ashley Mannix</span> may be able to help</p>



<a name="217948505"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217948505" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Caleb Zulawski <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217948505">(Nov 25 2020 at 22:53)</a>:</h4>
<p>This also sounds like something <span class="user-mention" data-user-id="281757">@Jubilee</span> may be interested in</p>



<a name="217950827"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217950827" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217950827">(Nov 25 2020 at 23:26)</a>:</h4>
<p>(Yeah, I'm also interested, although pretty busy at the moment. But I agree that the sanest approach to <code>-ffast-math</code>-style flags probably doesn't have nnan/ninf.)</p>



<a name="217951947"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217951947" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217951947">(Nov 25 2020 at 23:44)</a>:</h4>
<p>Thanks, any recommendations on strategy would be very helpful. If the strategy is to add some new <code>fadd_reassoc()</code> or <code>fadd_with_flags()</code>, I think I see how to implement those and understand the consequences. I assume we'd make <code>reassoc</code>, <code>contract</code>, and <code>arcp</code> safe (versus <code>fadd_fast</code>, which is unsafe). I'm not familiar enough with attributes to be of much help there, and I have no sense of the implementation complexity.</p>
<p>I understand there are two steps here: draft an RFC and iterate to approval, then implement. An approved RFC that nobody has time to implement isn't as useful. I don't have lots of time, but can probably push this forward over the coming weeks.</p>



<a name="217952463"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217952463" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217952463">(Nov 25 2020 at 23:53)</a>:</h4>
<p>Hmm. I think adding it unstably as intrinsics in libcore might be an okay first step. An RFC kind of implies a good stable API, and I suspect we wouldn't want to stabilize <code>fadd_reassoc</code>/<code>fadd_with_flags</code> etc. directly.</p>



<a name="217953121"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217953121" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217953121">(Nov 26 2020 at 00:03)</a>:</h4>
<p>Ah, so just implement the intrinsics in a PR without an RFC? Then we can do wrapper types in a nightly-only library or shoot for some consensus about attributes. If people like the library, then RFC to bring it into libcore?</p>



<a name="217954810"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217954810" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Lokathor <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217954810">(Nov 26 2020 at 00:34)</a>:</h4>
<p>Yeah that's the general flow.</p>
<p>It's "low cost" to get things into nightly. The worst that can happen is that we take it out later and the effort goes to waste.</p>
<p>I think that for dot products specifically it wouldn't be unreasonable to do a method on simd floating types that does the dot product computation, and that method can internally use whatever the heck.</p>



<a name="217955606"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217955606" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217955606">(Nov 26 2020 at 00:49)</a>:</h4>
<p><span class="user-mention" data-user-id="224471">@Lokathor</span> I don't think the use case here is dot products of single-register SIMD variables, it's dot products of large vectors (e.g. &amp;[f32] where len is pretty large).</p>
<p>I think having it for that sort of single-register float var is probably defensible, given the existence of stuff like <code>_mm_dp_ps</code>... although that stuff is very much <em>not</em> magic, and is basically the same speed as what you'd write by hand. (note that there's really only one family of genuinely fast horizontal ops on x86 and it's _mm_sad_epu8 and friends, last I checked)</p>
<p>Anyway, more generally, the fast math ops are useful not only for accelerating dot products but for a wide range of operations.</p>



<a name="217958111"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217958111" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217958111">(Nov 26 2020 at 01:40)</a>:</h4>
<p>The rule for things like intrinsics and <code>-Z</code> flags is that it's up to T-compiler, which generally means they'll allow it if the maintenance burden is low and there's a plausible path for it to become a "real" thing.</p>



<a name="217958204"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217958204" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217958204">(Nov 26 2020 at 01:42)</a>:</h4>
<p>Yeah, the use case is pretty long vectors (and lots of other algorithms -- my interest at present being in material models and PDE solvers, but there are many parallels with ML libraries). I'll try to work up a PR for a <code>fadd_with_flags</code> (only the "safe" flags) as time allows.</p>



<a name="217958373"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217958373" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217958373">(Nov 26 2020 at 01:46)</a>:</h4>
<p>Curiosity: how often do people need some-but-not-all of the safe flags?  When might I want <code>arcp</code> but not <code>contract</code>?</p>



<a name="217958528"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217958528" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217958528">(Nov 26 2020 at 01:50)</a>:</h4>
<blockquote>
<p>and lots of other algorithms -- my interest at present being in material models and PDE solvers, but there are many parallels with ML libraries</p>
</blockquote>
<p>Yeah, I've wanted this before too. When I was more interested in Rust for game code, I remember looking at porting the backward Euler solver (modified conjugate gradient) for <a href="https://codepen.io/thomcc/full/NGQpxv">https://codepen.io/thomcc/full/NGQpxv</a> to Rust for a soft body/cloth sim library.</p>
<p>I never finished that, or I'd actually have a link to the Rust code, but it's an example of something <span class="user-mention" data-user-id="224471">@Lokathor</span> might be somewhat interested in that uses a lot of big vectors and matrices (note that a lot of implementations of this sort of thing use forward Euler or particle sims, but backward Euler is <em>much</em> more stable in both theory and practice).</p>



<a name="217958703"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217958703" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217958703">(Nov 26 2020 at 01:55)</a>:</h4>
<p><span class="user-mention" data-user-id="125270">@scottmcm</span> Hmm, hard to say. Usually, if I wouldn't be okay with some of the flags (for example, in sensitive computational geometry or interval arithmetic code), I wouldn't be okay with any... And when I'm okay with some, I'm okay with all the (safe) ones. That said, I think arcp can introduce NaNs where otherwise you'd have infinity in some edge cases? Not 100% sure I remember.</p>



<a name="217958716"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217958716" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217958716">(Nov 26 2020 at 01:55)</a>:</h4>
<p>Actually, that's probably true more broadly, and not just for arcp</p>
<p>(edit: maybe not — actually this might be the justification for nnan/ninf now that I think about it — so the compiler can do optimizations that can introduce those even if as written it wouldn't. Sadly, it's been long enough that I barely remember this stuff...)</p>



<a name="217959001"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217959001" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217959001">(Nov 26 2020 at 02:02)</a>:</h4>
<p>I think you almost always want <code>contract</code> (it's on by default in gcc). <code>reassoc</code> and <code>arcp</code> do things that were by-hand optimization once upon a time (if we turn them on, formulas can be easier to read by looking more like the papers). <code>reassoc</code> is especially helpful if you lean on higher-order functions like <code>fold</code> and relatives.</p>
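<p>As an illustration of the by-hand version, here is a sketch of a dot product that reassociates the sum manually into four independent partial sums. The function name and the choice of four accumulators are assumptions for the example. A <code>reassoc</code>-style flag would let the compiler derive something like this from a plain <code>fold</code>.</p>

```rust
/// Dot product with four independent accumulators (a by-hand reassociation).
/// The result can differ in the last bits from a strict left-to-right sum.
pub fn dot_reassociated(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0_f32; 4];
    let (ca, cb) = (a.chunks_exact(4), b.chunks_exact(4));
    // Leftover elements when the length is not a multiple of 4.
    let (ra, rb) = (ca.remainder(), cb.remainder());
    for (xa, xb) in ca.zip(cb) {
        for i in 0..4 {
            // Each lane has its own dependency chain, which is what
            // lets the loop vectorize.
            acc[i] += xa[i] * xb[i];
        }
    }
    let tail: f32 = ra.iter().zip(rb).map(|(x, y)| x * y).sum();
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}
```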



<a name="217959232"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217959232" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217959232">(Nov 26 2020 at 02:07)</a>:</h4>
<p>Cool that you've done some cloth simulation. When the material gets lower stretch, the implicit solve becomes really important. Backward Euler is too dissipative, but trapezoid/midpoint, BDF2/alpha, and Newmark methods are popular. High-order Gauss or Lobatto methods are also interesting (but solves are complicated). The CG is one ingredient, but for scalability, one needs a multigrid method of some sort. Cloth is quasi-2d and can be solved using sparse Cholesky methods, but 3D problems scale much worse (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mn>2</mn></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(n^2)</annotation></semantics></math></span><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">O</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> in 3d, versus <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mrow><mn>3</mn><mi mathvariant="normal">/</mi><mn>2</mn></mrow></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(n^{3/2})</annotation></semantics></math></span><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" 
style="height:1.138em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">O</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8879999999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">3</span><span class="mord mtight">/</span><span class="mord mtight">2</span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> in 2d).</p>



<a name="217959332"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217959332" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217959332">(Nov 26 2020 at 02:09)</a>:</h4>
<blockquote>
<p>Backward Euler is too dissipative, but trapezoid/midpoint, BDF2/alpha, and Newmark methods are popular.</p>
</blockquote>
<p>Thanks — I was pretty sure my methods were quite out of date, but wasn't sure what the state of the art here was for it.</p>



<a name="217959576"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217959576" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217959576">(Nov 26 2020 at 02:15)</a>:</h4>
<p><span class="user-mention silent" data-user-id="209168">Thom Chiovoloni</span> <a href="#narrow/stream/257879-project-portable-simd/topic/dot.20product/near/217958716">said</a>:</p>
<blockquote>
<p>Actually, that's probably true more broadly, and not just for arcp</p>
</blockquote>
<p><code>reassoc</code>, at least, definitely can.</p>



<a name="217959715"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/257879-project-portable-simd/topic/dot%20product/near/217959715" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/257879-project-portable-simd/topic/dot.20product.html#217959715">(Nov 26 2020 at 02:18)</a>:</h4>
<p><code>a*a - a*a</code> with <code>contract</code> can produce nonzero. So this is generally true.</p>
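<p>This can be reproduced on stable Rust by writing the contraction out by hand. The sketch below assumes the compiler fuses one of the two products into the subtraction, which is one of the rewrites <code>contract</code> permits; <code>square_difference</code> is a hypothetical helper name.</p>

```rust
/// Returns `a*a - a*a` evaluated strictly and with one product fused.
pub fn square_difference(a: f32) -> (f32, f32) {
    // Strict: both products round identically, so the difference is 0.
    let strict = a * a - a * a;
    // Contracted: the fused product is exact, the other one is rounded,
    // so the result is the rounding error of `a * a`.
    let contracted = a.mul_add(a, -(a * a));
    (strict, contracted)
}
```

<p>For any <code>a</code> whose square is not exactly representable, such as <code>1.0 + 1.0 / 4096.0</code>, the second value is nonzero.</p>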



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>